Abstract

Co-authorship graphs (that is, graphs of authors linked by co-authorship
of papers) are complex networks that express the dynamics of a complex
system. Their study has only recently started to draw interest from the
evolutionary computation (EC) community; the first paper dealing with
them was published two years ago. In this paper we study the
co-authorship network of EC at a microscopic level. Our objective is to
ascertain which are the most relevant nodes (i.e., authors) in it. For
this purpose, we examine several metrics defined in the complex-network
literature, and analyze them both in isolation and combined within a
Pareto-dominance approach. Our analysis indicates that some well-known
researchers appear systematically in the top rankings. This also
provides some hints on the social behavior of our community.

Keywords: Social networks, co-authorship networks, scientometrics, 
sociology of science, evolutionary computation, eigenvalues, Pareto front 

1 Introduction 

Academia, like any other human endeavor, is a complex adaptive system (CAS),
and some of its aspects reflect that fact. Since publishing is one of the
outstanding (and measurable) aspects of academia, it is interesting to study it
to find out how general CAS mechanisms apply to it, and to create general
models of the system.

One of the possible ways to study this publishing activity is to look at
co-authorship graphs, where nodes (or actors) correspond to authors, joined by
an edge if they have coauthored a paper. This is an undirected graph, which
does not take into account the authorship order and, besides, assumes that all
signing authors are actually authors; following Yoshikane et al. [24], we
consider that, in general, this assumption is true. Co-authorship graphs have
been studied for a long time, starting with Kretschmer [16], but they started
to be recognized as complex networks with the work of Newman [T^J [20] and
Barabasi [T], showing that they follow power laws [T5] (which might correspond
to preferential-attachment growth) and also behave as a small world [T5].

Even though the general framework has already been laid out, there are still
a few open issues. Measurements for a particular field, such as evolutionary
computation [7], have to be made, and the evolution of its graph followed [5].
This evolution reflects the differential authoring mechanisms in particular
fields, and these mechanisms can be modelled. Besides, within every field,
finding the sociometric stars reveals the knowledge flow within it and its
fertility. Synchronic networking (co-authorship relations) is also related to
diachronic networking (citing or co-citing relations) and, thus, is also
interesting for knowledge discovery within a particular field.

Another open issue is exactly what to measure in that network. Looking 
at a single measure will yield a partial view of the network. While there is 
a high correlation among some measures (such as betweenness and closeness; 
definitions will follow later), they reflect different aspects of the network and, 
thus, they will have to be taken into account globally when making a ranking 
of the sociometric stars of the graph. In this paper, each actor will be assigned 
a vector of quantities, and the ranking will be done according to the concept 
of Pareto front [T3], that is, the set of non-dominated authors. Identifying the 
key actors in our field not only provides an objective metric against which our
subjective perception can be contrasted, but can also help to understand some
of the patterns of social behavior at work in our community.

The rest of the paper is organized as follows: next we present a brief state
of the art on the subject. The resources and methodology used in this paper
are presented in Section 3, and the results of the analysis in Section 4.
Finally, we draw some conclusions in Section 5.

2 State of the art 

Coauthorship studies have generally focused on macroscopic measures of
particular scientific communities: computer support of cooperative work [13],
psychology and philosophy [pj, chemistry [ID], SIGMOD authors [TB] and
sociology [T7]. Other authors [5] have analyzed the topological properties of
these networks in general, looking at a particular preprint database
(cond-mat). They found that betweenness and degree follow long-tailed
distributions, which is usually to be expected, although it is interesting to
confirm that it actually happens for a representative sample.

Although all coauthors are usually considered indistinctly, in some cases the
roles of different authors are taken into account [24]; however, the focus
there is on the visualization of relations among authors, not on a differential
analysis of the different positions or on the study of the internal structure
of the cliques formed via co-authorship.
Another approach to the study of scientific communities is to understand the
role of different actors by measuring certain microscopic (node-based)
quantities; centrality (see [3]) is one of them, although its definition is not
trivial. For example: is an actor more central when it is more visible, or more
influential, or more powerful? Furthermore, how can visibility, influence, and
power be defined in an objective sense? It should be taken into account that
intuition can be misleading here. As an example, consider that one of the first
(intuitive) principles proposed was that centrality grows monotonically with
the number of ties, and that adding ties (edges) should increase a node's
centrality [22]. While these ideas look attractive and intuitive, they do not
provide a satisfactory definition of centrality, since the importance of a
certain node can be diminished when another node gets more ties. Freeman [12]
addressed this issue by reviewing a number of published measures, and
identified three basic concepts for defining centrality: degree, closeness, and
betweenness. In this canonical formulation, these three measures attain their
maximum values when the network is star-shaped, hence providing a proper
characterization of centrality. Borgatti [2] has also elaborated on this issue,
considering the dynamic flow all over the network, and how often traffic flows
through a certain node, or how long it takes to get to a certain node.

The number of papers in this area indicates that there is still a lot of work
to be done, be it for a particular area or discipline, or on the visualization
and methods front. In this paper, we use a new method, based on Pareto
dominance, to examine the sociometric ranking in a particular field taking into
account several centrality features at the same time.

3 Resources and methodology 

The bibliographical data used for the construction of the
scientific-collaboration network in EC has been gathered from the DBLP (Digital
Bibliography & Library Project) bibliography server, maintained by Michael Ley
at the University of Trier. This database provides bibliographic information on
major computer science journals and proceedings, comprising more than 830,000
articles and several thousand computer scientists (by the end of 2006). We have
defined a collection of terms that includes the acronyms of EC-specific
conferences (such as GECCO, PPSN or EuroGP) and keywords (such as "Evolutionary
Computation", "Genetic Programming", etc.) that are sought in the title or in
the publication forum of papers. Starting from an initial sample of authors
(those that have published at least one paper in the last five years in any of
the following large EC conferences: GECCO, PPSN, EuroGP, EvoCOP, and
EvoWorkshops), their list of publications is checked for relevance, and the
corresponding co-authors are recursively examined. Just as an indication of the
breadth of the search, the number of authors used as seed is 3,773, whereas the
final number of authors in the network is 7,712, that is, more than twice as
many.
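The recursive expansion described above can be sketched as a breadth-first traversal over co-authorship links. This is only an illustrative sketch: the function name `snowball_coauthors`, the in-memory `papers` list, and the toy author names are assumptions, and the relevance filter on titles and venues is taken as already applied to `papers`.

```python
from collections import deque
from itertools import combinations


def snowball_coauthors(papers, seeds):
    """Expand a seed set of authors through co-authorship links.

    papers: list of author-name lists, one per (already filtered) paper.
    seeds:  iterable of initial author names.
    Returns (authors, edges): every author reached from the seeds and the
    undirected co-authorship edges found along the way.
    """
    # Index papers by author so each author's publication list can be looked up.
    by_author = {}
    for authors in papers:
        for name in set(authors):
            by_author.setdefault(name, []).append(authors)

    reached = set(seeds)
    queue = deque(reached)
    edges = set()
    while queue:
        author = queue.popleft()
        for authors in by_author.get(author, []):
            unique = sorted(set(authors))
            # Every pair of co-signers is joined by one unordered edge.
            edges.update(combinations(unique, 2))
            for coauthor in unique:
                if coauthor not in reached:
                    reached.add(coauthor)
                    queue.append(coauthor)
    return reached, edges


# Toy data (hypothetical): the paper by E and F is never reached from seed A.
papers = [["A", "B"], ["B", "C", "D"], ["E", "F"]]
authors, edges = snowball_coauthors(papers, seeds=["A"])
```

Authorship order is discarded when the edges are built, in line with the undirected graph used throughout the paper.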

The macroscopic measures of the network obtained through this procedure
are shown in Table 1, comparing them with measures taken from a CS repository
(NCSTRL, data taken from [21]) and with historical data on what is basically
the same network, taken from [5], [7].

Table 1: Summary of results of the analysis of a computer science
collaboration network (NCSTRL) and of the previous analysis of the EC
co-authorship network (EC05), taken from [7]. Measures for this paper (EC06)
are shown in the middle column, and were taken during November 2006. Data for
NCSTRL are taken from Newman [21].

                                 EC05      EC06      NCSTRL
  total papers                   6199      8501      13169
  total authors                  5492      7712      11994
  mean papers per author         2.9       2.87      2.6
  mean authors per paper         2.56      2.60      2.22
  collaborators per author       4.2       4.02      3.6
  size of the giant component    3686      4804      6396
    as a percentage              67.1%     62.3%     57.2%
  2nd largest component          36        106       42
  clustering coefficient         0.798     0.811     0.496
  mean distance                  6.1       10.9      9.7
  diameter                       18        21        31

The first and obvious observation is that there has been a progression from
2005 to 2006: there are more than two thousand new authors, and it is also
interesting to see that many of these authors have joined the main component.
This increase in the number of authors probably accounts for the increase in
the diameter, which goes from 18 to 21, a small increase that shows again the
small-world character of this network. Metrics such as the clustering
coefficient, the average number of collaborators per author or of authors per
paper are quite close, with a very small variation, which indicates that
collaboration patterns continue in the same way. In the next section we analyze
the microscopic features of the network, and in particular who the most
prominent nodes are.
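As an illustration of how several rows of Table 1 can be obtained from a list of papers, the following minimal sketch computes them in plain Python. The function name `macroscopic_measures` and the toy `papers` list are assumptions made for the example; the clustering coefficient, mean distance and diameter are left out for brevity.

```python
from collections import defaultdict, deque


def macroscopic_measures(papers):
    """Compute basic collaboration statistics from a list of author lists."""
    papers_of = defaultdict(int)    # number of papers signed by each author
    neighbours = defaultdict(set)   # distinct collaborators of each author
    for authors in papers:
        unique = set(authors)
        for name in unique:
            papers_of[name] += 1
            neighbours[name] |= unique - {name}

    # Sizes of the connected components, found by breadth-first search.
    seen, sizes = set(), []
    for start in papers_of:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in neighbours[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append(size)

    n_authors = len(papers_of)
    return {
        "total papers": len(papers),
        "total authors": n_authors,
        "mean papers per author": sum(papers_of.values()) / n_authors,
        "mean authors per paper": sum(len(set(p)) for p in papers) / len(papers),
        "collaborators per author":
            sum(len(neighbours[a]) for a in papers_of) / n_authors,
        "size of the giant component": max(sizes),
    }


# Toy data (hypothetical author names).
papers = [["A", "B"], ["B", "C", "D"], ["E", "F"], ["A"]]
stats = macroscopic_measures(papers)
```

On the toy data the giant component contains four of the six authors; the same ratio taken on the EC06 column of Table 1 gives 4804 / 7712, i.e., the 62.3% reported there.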

4 Sociometric stars 

Centrality can be measured in multiple ways. We are going to focus firstly
on metrics based on geodesics, i.e., the shortest paths between actors in the
network. These geodesics constitute a very interesting source of information:
the shortest path between two actors defines a "referral chain" of intermediate
scientists through whom contact may be established (cf. [5T]). It also provides
a sequence of research topics (recall that common interests exist between
adjacent links of this chain, as defined by the co-authored papers) that may
suggest future joint works or lines of research.

The first geodesic-based centrality measure we are going to analyze is
betweenness [11], i.e., the relative number of geodesics between any two actors
j, k ($g_{jk}$) passing through a certain actor i ($g_{jik}$), summed over all
j, k:

$$b_i = \sum_{j,k} \frac{g_{jik}}{g_{jk}} \qquad (1)$$

This measure is based on the information flow between actors: when a joint
paper is written, the authors exchange lots of information (such as knowledge
of certain techniques, research ideas, potential development lines, or
unpublished results) which can in turn be transmitted (at least to some extent)
to their colleagues in other papers, and so on. Hence, actors with high
betweenness are in some sense "hubs" that control this information flow; they
are recipients (and emitters) of huge amounts of cutting-edge knowledge;
furthermore, their removal from the network results in an increase of the
geodesic distances among a large number of actors [23].
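To make the definition of betweenness concrete, the following sketch evaluates it by brute force on a small undirected graph. It is illustrative only: the function names and toy graphs are assumptions, each unordered pair (j, k) is counted once (the text does not state whether pairs are ordered), and the cost grows cubically with the number of actors, so a network of several thousand authors would call for a faster algorithm.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): geodesic lengths and numbers of geodesics from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on some geodesic
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Sum over unordered pairs (j, k) of the fraction of geodesics through i."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {i: 0.0 for i in adj}
    for j, k in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:              # j and k lie in different components
            continue
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            # i lies on a j-k geodesic iff the two legs add up to the j-k distance.
            if dist_j[i] + dist_i[k] == dist_j[k]:
                score[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return score


# Toy graphs (hypothetical): a star and a four-node ring.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
star_scores = betweenness(star)
ring_scores = betweenness(ring)
```

In the star every geodesic between two leaves passes through the hub, which therefore collects all the betweenness; in the ring each pair of opposite nodes has two geodesics, so the two intermediate nodes share the credit equally.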



Table 2: Top ten actors according to their betweenness.

        Name              betweenness
   1    D.E. Goldberg     2194962
   2    K. Deb            1861389
   3    M. Schoenauer     1479185
   4    H. de Garis       1246007
   5    Z. Michalewicz    1144581
   6    X. Yao            1060389
   7    R.E. Smith        928108
   8    M. Tomassini      921023
   9    T. Back           818897
  10    K.A. De Jong      772788


Table 2 shows the top ten actors according to this centrality measure. Notice
how the betweenness values decrease abruptly from one actor to the next. There
is clearly a power law at work (actually, a power law with exponential cutoff),
as shown in Figure 1. This scaling is consistent with the presence of a
hierarchy of hubs in the network. Whenever a shortest path is sought between
two nodes, the nearest common ancestor in the hierarchy is used. Top actors in
this ranking are thus those located in the center of gravity of the network,
connecting distant regions of the latter.

The second centrality measure we are going to consider is precisely based on
this geodesic distance. Intuitively, the length of a shortest path indicates
the number of steps that research ideas (and in general, all kinds of memes)
require to jump from one actor to another. Hence, scientists whose average
distance to other scientists is small are likely to be the first to learn new
information, and information originating with them will reach others quicker
than information originating with other sources. Average distance (i.e.,
closeness) is thus a measure of the centrality of an actor in terms of their
access to information.
Figure 1: Logarithmic plot of betweenness vs. rank for the EC co-authorship
network. There is a power law up to position #500, with an exponential cutoff
after that. It is quite usual for social networks to have a power law in these
quantities.



Table 3: Top ten actors according to their closeness.

        Name              closeness
   1    K. Deb            4.1886571e-005
   2    Z. Michalewicz    4.1091387e-005
   3    D.E. Goldberg     4.0731538e-005
   4    M. Schoenauer     4.0731538e-005
   5    B. Paechter       3.9521005e-005
   6    A.E. Eiben        3.9502271e-005
   7    D.B. Fogel        3.8568343e-005
   8    H.-G. Beyer       3.8433452e-005
   9    H.A. Abbass       3.8358266e-005
  10    M. Tomassini      3.8165026e-005

Table 3 shows the top ten actors according to this centrality measure,
expressed here as the reciprocal of farness, that is, of the sum of the lengths
of the geodesic paths ($d_{ij}$) from a node to every other one:

$$C^{CLO}_i = \frac{1}{\sum_j d_{ij}} \qquad (2)$$

Notice that the differences in closeness values are not as marked as for
betweenness, i.e., there is no power law acting here. The fact that our network
is a small world contributes to this. Notice also that the names appearing in
this ranking are very similar, although not identical, to the ranking yielded by 
betweenness; there are actually five well-known researchers (K. Deb, D.E. Gold- 
berg, Z. Michalewicz, M. Schoenauer, and M. Tomassini) showing up in both 
rankings, in slightly different places. 
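Closeness as the reciprocal of farness can be sketched with one breadth-first search per actor. The function name `closeness` and the toy path graph are assumptions made for the example, and only the actors reachable from each source are counted, so the sketch is meaningful within a single connected component.

```python
from collections import deque


def closeness(adj):
    """Reciprocal of farness (sum of geodesic lengths) for every node."""
    result = {}
    for source in adj:
        # Breadth-first search gives geodesic lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        farness = sum(dist.values())
        # An isolated node has zero farness; report 0.0 instead of dividing by zero.
        result[source] = 1.0 / farness if farness else 0.0
    return result


# Toy path graph a - b - c (hypothetical).
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = closeness(path)
```

In the toy path the middle node has farness 2 and the end nodes farness 3. By the same definition, the top value in Table 3 (about 4.19e-5) corresponds to a farness of roughly 2.4 × 10^4 steps.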

In addition to the centrality measures based on geodesics, there exists an
interesting group of metrics based on degree, that is, on the number of ties
each actor has. One of them is Bonacich's index, also called power [2, 4]. In
social networks, this power is related to the possibility of going (originally
negotiating) from one actor to another in the network using all possible paths.
Lots of paths imply lots of options, which, in turn, imply many different ways
of negotiating with or influencing another actor in the network. Notice that
this index can also be interpreted in the following way: one's power is higher
if one has many connections, and even more so if these connections have high
power too. Actually, one of the methods for determining this index is finding
the fixed point of a linear combination of one's degree and one's neighbors'
power. The coefficients α and β of this linear combination have to be chosen so
that the procedure converges (this can be ensured by picking β lower than the
reciprocal of the dominant eigenvalue of the adjacency matrix [4]), and the
resulting values are adequately normalized (in our case, the average squared
power is 1.0):

$$c_i(\alpha, \beta) = \sum_j (\alpha + \beta c_j) A_{ij} \qquad (3)$$
where A is the adjacency matrix. Results for this measure, presented for the
first time for the EC network in this paper, are shown in Table 4.
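The fixed-point procedure for Bonacich power can be sketched as a simple iteration. This is a minimal sketch under stated assumptions: the function name `bonacich_power`, the default α = 1, the stopping tolerance, and the toy star graph are all choices made for the example, not details given in the paper.

```python
import math


def bonacich_power(adj, beta, alpha=1.0, tol=1e-12, max_iter=10_000):
    """Iterate c_i <- sum_j (alpha + beta * c_j) A_ij to its fixed point.

    adj maps each node to the list of its neighbours (undirected graph).
    The iteration converges only if beta is below the reciprocal of the
    dominant eigenvalue of the adjacency matrix.
    """
    power = {i: 0.0 for i in adj}
    for _ in range(max_iter):
        new = {i: sum(alpha + beta * power[j] for j in adj[i]) for i in adj}
        change = max(abs(new[i] - power[i]) for i in adj)
        power = new
        if change < tol:
            break
    else:
        raise ValueError("no convergence: beta may be too large")
    # Rescale so that the average squared power is 1.0.
    scale = math.sqrt(sum(p * p for p in power.values()) / len(power))
    return {i: p / scale for i, p in power.items()}


# Toy star graph (hypothetical); its dominant eigenvalue is sqrt(3),
# so beta = 0.2 is safely below 1 / sqrt(3).
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
power = bonacich_power(star, beta=0.2)
```

Because of the final rescaling, the value of α only fixes the overall scale before normalization; the ranking is governed by β, which sets how much a neighbour's power counts relative to plain degree.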



Table 4: Top ten actors according to their power (Bonacich power).

        Name              power
   1    D.E. Goldberg     1.43818
   2    M. Schoenauer     1.29379
   3    K. Deb            1.18711
   4    D. Keymeulen      1.05472
   5    X. Yao            1.04514
   6    L.D. Whitley      1.04259
   7    T. Back           0.93871
   8    T. Higuchi        0.93698
   9    H. de Garis       0.91823
  10    L. Kang           0.89615



There are many names in common with the other rankings, but the interesting
part is precisely those that are not in common, especially D. Keymeulen.
D. Keymeulen works in evolvable hardware, and is a coauthor of T. Higuchi,
who is also a new name in this ranking; in turn, Dr. Higuchi is a coauthor of
H. de Garis and X. Yao, which accounts for the high values of this measure for
them. H. de Garis has also coauthored with L. Kang, which explains the rest
of the ranking (it must be added that L. Kang is a coauthor of Z. Michalewicz,
who is 11th in the ranking). Besides, L. Kang and T. Higuchi are gateways
to huge national communities (Chinese and Japanese), over which they have
power. It is not surprising to find the actors in this ranking separated by
just a few handshakes, or edges. In fact, there are several high-ranking
kernels, with each sociometric figure being a gateway to or the center of a few
of them.

A related approach to the previous one follows when α = 0. In that case,
an actor's power is simply

$$c_i = \beta \sum_j A_{ij} c_j$$

that is, the vector of centrality values is an eigenvector of the adjacency
matrix, and β must be the reciprocal of an eigenvalue. It is customary to pick
the centrality vector associated with the largest eigenvalue. In this case, the
resulting ranking is shown in Table 5.

As can be seen, the dominant eigenvector gravitates around D. Keymeulen,
who is the actor with the 6th-highest degree (50 coauthors). His role is
enhanced by his collaboration with T. Higuchi (and vice versa). The remaining
actors in the top ten happen to be coauthors of these two researchers, and
hence their presence can be partly attributed to "hitchhiking".
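The dominant eigenvector can be approximated by power iteration, as in the following sketch. Several details are assumptions made for the example: the function name, the unit Euclidean normalization, and the shift by the identity matrix, which is added here only so that the iteration does not oscillate on bipartite graphs; the shift leaves the eigenvectors of the adjacency matrix unchanged.

```python
import math


def eigenvector_centrality(adj, tol=1e-12, max_iter=10_000):
    """Dominant eigenvector of the adjacency matrix by shifted power iteration."""
    x = {i: 1.0 for i in adj}
    for _ in range(max_iter):
        # One multiplication by (A + I): own value plus the neighbours' values.
        new = {i: x[i] + sum(x[j] for j in adj[i]) for i in adj}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {i: v / norm for i, v in new.items()}
        change = max(abs(new[i] - x[i]) for i in adj)
        x = new
        if change < tol:
            return x
    raise ValueError("power iteration did not converge")


# Toy star graph (hypothetical): the hub scores sqrt(3) times a leaf.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
centrality = eigenvector_centrality(star)
```

The "hitchhiking" effect mentioned above is visible in the update rule: a node's score is fed only by its neighbours' scores, so coauthors of a high-scoring actor inherit part of that score regardless of their own degree.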

Table 5: Top ten actors according to their eigenvector centrality.

        Name              score
   1    D. Keymeulen      0.358
   2    T. Higuchi        0.312
   3    M. Iwata          0.255
   4    I. Kajitani       0.249
   5    N. Kajihara       0.221
   6    M. Murakawa       0.205
   7    E. Takahashi      0.169
   8    H. Sakanashi      0.166
   9    N. Otsu           0.154
  10    M. Salami         0.153

It is evident from the inspection of the above tables that each of the measures
tells a different part of the story. Although there is some correlation among
them, this correlation is not perfect, and hence the rankings differ. It then
makes sense to think about possible ways of combining this information. This is
a standard issue in multi-objective optimization, and we can resort to the
notion of Pareto dominance as a means to achieve a global perspective on the
centrality status of the different actors. This is what we do next.

First of all, we have performed pairwise combinations of the previous
centrality measures, to determine the extent to which they can be said to be
correlated. Figures 2, 3 and 4 show scatter plots of all six possible
combinations, and indicate the corresponding non-dominated front (notice that
centrality values are always maximized). Notice firstly in Figure 2 and
Figure 3 (left) that Bonacich power, closeness, and betweenness are fairly well
correlated. The tail of the rankings is rather wide, but a clear correlation is
observed for top actors. Indeed, there are just two actors appearing in the
non-dominated fronts: D.E. Goldberg (thrice) and K. Deb (twice). The former is
actually the global dominating actor for power vs. betweenness. When
eigenvector centrality is introduced, the situation is different: centrality
gravitates for this measure around a different set of actors, and this is
especially clear in Figure 3 (right) and Figure 4. The correlation is now much
more questionable, and as a consequence the non-dominated fronts tend to be
wider.

Figure 2: Scatter plot of centrality values under different measures. (Left)
closeness vs. betweenness, (right) closeness vs. power.

Figure 3: Scatter plot of centrality values under different measures. (Left)
power vs. betweenness, (right) power vs. eigenvector.

Figure 4: Scatter plot of centrality values under different measures. (Left)
closeness vs. eigenvector, (right) eigenvector vs. betweenness.

Next, we consider the non-dominated front arising from the simultaneous
optimization of all four centrality measures. In this case, the resulting
actors are K. Deb, H. de Garis, D.E. Goldberg, T. Higuchi, D. Keymeulen, and
X. Yao. If we go to the second non-dominated front (i.e., the front that would
result from the removal of the actors in the previous front), we get T. Back,
C. Coello Coello, D.B. Fogel, T. Hoshino, H. Iba, M. Iwata, L. Kang, J. Li,
Y. Liu, Z. Michalewicz, M. Schoenauer, and R. Salem. We can easily identify
highly recognized researchers in this list, as well as some actors raised to a
prominent status due to eigenvector hitchhiking. To see the impact of this
fourth centrality measure, we have also recomputed the global fronts
considering just closeness, betweenness, and power. The result is the
following:

• Front #1: K. Deb, D.E. Goldberg

• Front #2: Z. Michalewicz, M. Schoenauer

• Front #3: T. Back, A.E. Eiben, H. de Garis, D. Keymeulen, B. Paechter,
M. Tomassini, X. Yao

• Front #4: D.B. Fogel, J.J. Merelo, T. Higuchi, K.A. De Jong, L. Kang,
E. Lutton, R.E. Smith, L.D. Whitley

• Front #5: H.A. Abbass, H.-G. Beyer, J. Branke, M. Dorigo, T.C. Fogarty,
H. Iba, M. Keijzer, E.G. Talbi, M.D. Vose

Several things must be noted. Firstly, the top two actors for eigenvector 
centrality also have prominent places in these fronts, showing that the fact that 
they were spotted by the former centrality measure is not a spurious effect, 
but a solid indicator of their relevance in the network structure. Secondly, all 
researchers appearing in these fronts are very well-known in the field for their 
research excellence. Their appearance in one front or another does not represent 
therefore a scientific ranking, but a measure of their connectedness under three 
different measures. 
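The successive non-dominated fronts used in this section can be obtained by repeatedly peeling off the actors that no remaining actor dominates. The sketch below assumes each actor is given as a tuple of centrality values to be maximized; the function names and the toy scores are hypothetical and are not taken from the tables.

```python
def dominates(u, v):
    """True if u is at least as good as v in every measure and better in one."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


def pareto_fronts(scores):
    """Peel successive non-dominated fronts from {actor: tuple of values}."""
    remaining = dict(scores)
    fronts = []
    while remaining:
        # An actor is in the current front if no other remaining actor dominates it.
        front = sorted(
            actor for actor, u in remaining.items()
            if not any(dominates(v, u)
                       for other, v in remaining.items() if other != actor)
        )
        fronts.append(front)
        for actor in front:
            del remaining[actor]
    return fronts


# Toy centrality vectors (hypothetical values, three measures per actor).
scores = {"p": (3, 1, 2), "q": (1, 3, 2), "r": (2, 1, 1), "s": (1, 1, 1)}
fronts = pareto_fronts(scores)
```

In the toy data neither p nor q dominates the other, since each is better in a different measure, so both fall in the first front; this is the situation of actors that excel under different centrality measures.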

5 Conclusions 

Centrality analysis is fundamental in order to grasp the microscopic structure 
of a network, and understand the mechanisms governing its temporal evolution. 
Consider that our¹ current network is the result of the incremental addition of
ties through the years. In this sense, two reflections can be made: the first one 
is that central nodes owe their prominence to the fact that they are strategically 
located within the network structure, and this indicates that the growth of the 
network somehow gravitates around them. Secondly, and related to the previous 
fact, centrality is a dynamic property that changes with time. An asymmetric 
growth of the network may displace one actor from a relatively central position, 
or a highly active actor may become with time one of the most relevant hubs of 
the network. The analysis presented in this work must thus be interpreted as a 
snapshot of the situation. 

There are two aspects to be highlighted in this work. The first one refers to
the methodology used. To the best of our knowledge, this is the first time that
a multi-objective approach has been used to characterize centrality using
different measures. We believe this is the natural approach to follow in this
kind of study, since each measure provides a different (despite some obvious
correlation) perspective on the relevance of each actor. The second aspect is
the actual results obtained. We have identified a group of researchers (most
prominently K. Deb and D.E. Goldberg, but many others can be cited too) that
systematically appear both in the top rankings under different measures and in
the corresponding non-dominated fronts. This gives an objective picture of who
the best connected EC scientists were and are (recall that there is an inherent
historical component in centrality). This also gives hints on the current "hot"
points of activity, and indirectly (via an examination of the corresponding
research subareas) on the current "hot" topics. Related to these issues, and in
particular to the historical aspect of centrality, we have studied elsewhere
[8] the temporal evolution of the macroscopic properties of the collaboration
network. It would be very interesting, and it actually constitutes one of our
priorities, to conduct a temporal analysis of centrality, identifying the
trajectories of the most relevant actors and, if the trends are clear enough,
even forecasting future sociometric stars.

¹ Please note that the two authors are part of this community.